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Abstract 


A novel general approach to round-off error analysis using the error complexity concepts is 
described. This is applied to the analysis of the Gaussian Elimination and the Gauss-Jordan scheme 
for solving linear equations. The results show that the two algorithms are equivalent in terms of 
our error complexity measures. Thus the inherently parallel Gauss-Jordan scheme can be imple- 
mented with confidence if parallel computers are available. 


*This work was supported in part by the Space Act Agreement C99066G while the author was visiting ICOMP, NASA 
Lewis Research Center. 



1 . Introduction 


In past decades, the backward error analysis of Wilkinson[3] has enabled us to gain much insight 
into the behavior of round-off error propagation in numerical algebraic algorithms. One is required, 
naturally, to possess varying degrees of mathematical sophistication in order to apply his approach 
to the analysis of numerical algorithms on hand. 

The algorithms of Gaussian Elimination (GE) and Gauss-Jordan reduction (GJ) arc well known 
in solving linear equations. The numerical stability of Gaussian Elimination with partial pivoting 
is shown in [3] , and the stability of Gauss-Jordan reduction is shown in [4] , all using Wilkinson's 
approach. The results show' that in Gaussian Elimination the computed solution x of a given sys- 
tem Ax = b is the exact solution of some neighboring system (A -f E)x = b with a reasonable bound 
for /:, whereas in Gauss-Jordan reduction the computed solution x is such that each component 
of x belongs to the exact solution of a different neighboring system. 

In this paper we extend the error complexity concepts in [2] to division operation and apply 
it to the analysis of the Gauss-Jordan and Gaussian Elimination algorithms to show from an al- 
ternative point of view that the tw'o algorithms are indeed equivalent in terms of our error com- 
plexity measures. Thus one can implement the inherently parallel Gauss-Jordan reduction 
algorithm in with confidence for parallel computers. Some preliminary results are presented in 
Section 2. They are applied in Section 3 to the error complexity analysis of the two algorithms for 
solving linear system of equations. 

2. Some Preliminary Results 

Consider the simple problem of evaluating the 3 by 3 determinant 

a \\ a n (7 i3 

(2.1) det(/l)= a 2\ a 22 a 23 . 

a 3\ a 32 a 33 

A straightforward algorithm would evaluate (2.1) as 

det(j4) = ^ii(«22 a 33 — a 23 a n) + a n( a 23 a 3\ ~ a 2\ a 33) + a ]3( a 2\ a 32 ~ a 22 a 3])- 
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On the otherhand, one can also express (2.1) as 


(2.2) det (A) = a u a' 22 a" 33 

where 


a 22 ~ <*22 


u 2\ 

a !l 


a 3\ 

J 12, a .13 = ( a 33 ~ a \ 3) ~ 


1 i CL'* i 

(■ a 32 - ^77 a l2)( a 23 - -577 a l3)- 


22 


Expanding (2.2), we have 


#21 

det (.A) = a u (a 22 - — 


a 31 

a \2)( a 33 “ ~^7 3) ~ a \\( a 32 


a 3\ 

a ll 


a \2)( a 23 


a 2\ 

a \\ 


a \3) 


Which is equivalent to (2.1) once the term 


d 2\ 


“31 


«11 a 12 *11 


a \3 


is cancelled out. This is of course unlikely in actual computation due to round-ofT errors. Thus 
the latter approach is more likely to incur round-ofT errors during the computational process. 

Our approach in the error analysis of different algorithms designed for the same problem strives 
to provide such information as the number of round-ofT error occurrences as well as the extra terms 
created during the computational process for easy comparison and at the same time enable us to 
gain more insight into the details of how each algorithm works. 

Given a normalized floating-point system with a r-digit base /? mantissa, the following equations 
can be assumed to facilitate the error analysis of general arithmatic expressions using only 
+ , — , x , or / operations[3]: 

(2.3) Jl(x#y) = (x#j;)A, # e { + , - , x , /} 


where 



4- B l 1 for rounded operations 
I A | < 1 + k, u< < 2 

p ] ~ l for chopped operations 


and x and y are given machine floating-point numbers and /7(.) is used to denote the computed • 

floating-point result of the given argument. We shall call A the unit A -factor. 

In general one can apply (2.3) repeatedly to a sequence of arithmetic steps, and the computed 
result z can be expressed as 


(2.4) 


Mz n ) 


z- — = ' =1 


z d Mz d ) 
7= 1 


where each z nt or z dj is an exact product of error-free data, and A* stands for the product of k pos- 
sibly different A-factors. We should emphasize that all common factors between the numerator and 
denominator should have been factored out before z can be expressed in its final rationed form of 

(2.4) . Following [1], we shall henceforth call such an exact product of error-free data a basic term, 
or simply a term. Thus 2(z„) or 2(z rf ) is then the total number of such terms whose sum constitutes 
z„ or z d1 respectively, and a (z nt ) or o(z dl ) gives the possible number of round-off occurrences during 
the computational process. We define the following two measures: 

maximum error complexity: 

(2.5) a(z n ) = ma.x a(z ni ), a(z d ) = max cr(z di ) 

n \<i<A(z n ) y m) v \<]<?.(z d ) V 


cumulative error complexity: 


Mz d ) 

( 2 - 6 ) s ( z n) = S ^ d) S X! a(Z ^' 

j=l 
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Different algorithms used to compute the same z can then be compared using the above error 
complexity measures and the number of basic terms created by each algorithm. 


l ; or convenience we will use ch d (z) and ch n (z ) to represent the 3-tuples {X(z d ), o(z d ),s(z d )} and 
{/t (z„), < 7 (z„), s(z n )} , respectively, so that the computed z of (2.4) is fully characterized by 


ch(z) ee 


ch^z) ' 


The unit A-factor is then characterized by 


(2.7a) ■ 


c/i(A) = {1,1,1} = (A). 


One can also obtain easily using (2.5) and (2.6) that 


(2.76) 


ch{ A 1 ') = ch\ A) = { A}* = {!,/,/}. 


In division-free computations any computed z will have only the numerator part c\(z). The 
following lemma is useful in dealing with intermediate computed results: 


l^mma 2. 1 Given x and y with their associated ch n (x) and c/^(j>), 


(i) if z = xy, then 


ch n (z) = ch n (x)ch n (y) = ch n (y)ch n (x) 

= mx n )?.(y n ) % o(x n ) + o(y n ) f s(x n )X(y n ) + X(x n )s(y n )}, 


(ii) if z = x ±y, then 


ch n {z) = ch n (x) 4- ch n (y) = ch n (y) + ch n (x) 

= {X(x n ) + X(y n ), max(a(x n ),o{y n )) t s(x n ) + s(y n )} . 


Proof. The results can be obtained easily by expressing x and y as 



H*n) KVr) 


and applying (2.5), (2.6) and the definition of A(z n ) to find ch^z). Q.I .D. 


For general floating-point computations, we have the following lemma: 


1 emma 2.2 Given x and y with their associated 


e\(x) = <*(*„). ■*(*«)} d = ch n {y) = {A(y„), o(y n ), -r(y„)} 

ch^x) ~ {Mx d ), a(x d ), s(x d )j ’ C ’ chjy) ~ U(yj), o{y d ), s(y d )} 


(i) if z = fl(x ±y) and there is no common factors between x d and y d , then 


<\(z) _ ch^x)ch n (_r){A) 4- chJy)ch n (x){A} 
chjz) chj,x)chjy) 


where 


A(z n ) = Hx n )Hy d ) + My n )Mx d ), 

X{z d ) = X(x d )).(y d ), 

a(z n ) = 1 + max(<r(.v„) + o(y d ), o(y n ) + cr(x d )), 

°(zj) = °(x d ) + a{y d ), 

s{z n ) = X{x n )s(y d ) + l(y d )s{x n ) + /.{) ’ n )s{x d ) + X(x d )s(y n ) + X(z n ), 
s(z d ) = X(x d )s(y d ) + X(y d )s(x d )-, 


(ii) if z = fl(x x y) and there is no common factors between .v„ and y d or between y„ and x d , then 


c/i„(z) cA„(x)c/t„Q-’){A} 

eh p) ~ chJ^xjchjiy) 


where 
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Mz„) = 

Mzj) = 

<t(z„) = I + a(x„) + <<>’„), 

»(**) = a(x d ) + a(y d ), 

s{z n ) = ;.(x„)jO„) + / ) -(>'„)i(x /1 ) + A(z„), 

j(Zd) = 


(iii) if z = fl{xjy) and there is no common factors between x„ and y n or between x d and y d , then 


c/;„(z) ch n (x)chJy){A} 

ch^z) ch n (y)ch^x) 


where 


A(z„) = X(x n )X(y d ), 

).{z d ) = A(x d );.o„), 
ct(z„) = a(x n ) + a(y d ) + 1, 

<r(z d ) = ct(x^) + 

s(z n ) = /(x„).r(>’ rf ) + /(J^)v(x„) + X(z n ), 
s(z d ) = X(x d )s(y„) + X(y n )s{x d ). 


Proof, first we apply (2.3) to each case and obtain 


z =fl(x#y) = (x#y) A, # e { + , - , x , /}. 


The results can then be obtained easily by using Ixmmas 2.1 and 2.2. Q.E.D. 


3. Error Complexity Analysis of the Two Algorithms 


Given 


Ax — b 


where 
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a \l a \2 - 
a 2\ a 22 ■■ 

a l V 
a 2 V 

, h = 

*1 

b 2 


*1 

x 2 

a \l d X2 ■ 

•• a Si\ 

_ b N_ 

, A* — 

A, V 


it is desired to find a. We shall assume that A, b are error-free with ).{a^ = X(b) — 1 and pivoting 
is not necessary. For simplicity, the b vector is appended to A as the (;V + l)-st column of A. Hius 
initially the augmented matrix is such that 

n , fa Ui=j= 1. 

Ch{a ‘ j) jc, otherwise . 


where 


C\ = {/*i- <T i- r i} = c \ = fa> a \< - f i) = {1.0,0}. 


The use of c t to distinguish ch(a u ) from all others is to facilitate the task of identifying common 
factors as we shall see later. The two algorithms arc as follows: 


Algorithm GE. 


1. 




{begin Reduction to Triangular Form} 
for / = 1 to A' — 1 do 
for /c = z + 1 to A do 

ou =/?KR) 

for j = / + 1 to T 1 do 
% = /?(«*, ~a k , x a,,) 

{begin Back-Substitution) 

x n = fli a N,N+ i l a aw) 

for i — N— 1 downto 1 do 
for j = N downto / + 1 do 

=fK<*< % N+\ ~ a u X *j) 

X t 


Algorithm GJ. 

1. {begin Reduction to Diagonal Form) 
for /=1 to N do 

for k = 1 to N (except / ) do 
% =/KaJa „) 
for j = i T 1 to i\ + 1 do 

% =fl{% ~ <**, x a ,j) 

2. {begin Solvmg Diagonal System) 
for z = 1 to N do 
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*i =/%<**#* \K) 


A closer examination of the two algorithms reveals that the computed lower triangular part of 
the matrix A arc identical. The only difference between the two lies in the solution of the upper 
triangular systems: in Algorithm GT the back-substitution scheme is used, whereas in Algorithm 
GJ a forward -elimination scheme is used to reduce the upper triangular form to diagonal form for 
the final solution. Thus error-wise the Algorithm GJ is equivalent to the following modified one: 


Algorithm GJm. 

1. {begin Reduction to Upper Triangular Form} 
same as in Algorithm GT. 

2. {begin Reduction of Upper Triangular to Diagonal Form} 
for i — 1 to N — 1 do 

for k = 1 to / do 

a k,i + 1 ~ fli a k^\l a ,+ 1,/+|) 
for j = / + 2 to N + 1 do 
a k , =fl(a kl “ a ^\ x J 

3. {begin Solving Diagonal System} 
same as in Step 2 of Algorithm GJ. 


We first give a detailed analysis of Step 1 of both Algorithm GF and Algorithm GJm for 
N — 3. Applying (2.3) to Step 1, we obtain, after the first iteration for /= 1 is completed, the fol- 
lowing computed results: 


'll 

a l2 

a 13 

a l4 

' 1 1 

a' 22 

a< 23 

a 'u 

3 1 

a> 32 

a> 33 

^34 


where 


, _ a k\ A , , . . 2, a kj a \\ A ~ a k\ a \j A 

a k\ ~ a,, ’ a kj~ ( a kj A ~ a k\ a \j A ) = 


a 


1 1 


k = 2,3; j= 2,3,4. 


Applying Ixmmas 2.1 and 2.2 to the above equation, we obtain the following matrix for ch{a Lj ): 


c \ c \ c, 

q{A) C2_ <2_ C2 

C| c, c, c, 

c, {A} *x_ Sl_ Si 

Cl ^1 ^1 ^1 
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where 


c 2 = {/ 2 , cr 2 , s 2 ) =c 2 = {/ 2 , o 2 , ,? 2 } = c,c,{ A} + C,C,{A} = {1,1,1} + {1,3,3} = {2,3,4}. 

Note since a u is the common denominator of all newly computed results, hence c, is also present 
as the denominator of all computed items. Again c 2 is used to distinguish ch(a' n ) from the rest of 

ch(a’ kj ). 

Similarly after the completion of the second iteration for i = 2 of Step 1 in both algorithms, 
we have the following computed results: 


: 11 

a \2 

a 13 

a 

14 

2 1 

a' 22 

a> 23 

a ' 

24 

.3 I 

a 32 


a 

.34 


where 

„ d# 32 A 

32 — ; > 

a 22 

a “y = ( a 'y A - a" 32 a' 2j A 2 ), j = 3,4 

(<233^1 1 A — a 3 ] a\j& 3 )(a 22 ct\ i A — <J 2 ia, 2 A )A — (a 32 a^ , A — a 31 a 12 A ){a 2 ja \ , A — tf2i a i/A 3 )A 3 

( a 22 a 1 1 A — « 21 ^i 2 A 3 )a, , 

Note in the above expression, the term ls no * likely to be cancelled out in actual com- 

putation as expected in ideal computation. The c/z-matrix can be obtained from the above ex- 
pression as 

Cl 

Cl {A} 

C] 

where 

C 3 = c 3 = C 2 C 2 {A} + c 2 c 2 {A } 3 = (2,3,4} 2 [{ 1,1,1} + {1,3,3}] = {8,9,48}. 


Cl c, c, 

C2_ Sl iL 

ci c, r, 

c 2 {A} c 3 _Cj_ 

c 2 C. 2 C\ C 2 C, 


10 


In general we have the following theorem: 


Theorem 3.1 The c/z-matrix of the newly computed part of A, after the /- th iteration of Step 
1 of Algorithm GE is completed, is given as follows: 


ch(a u ) = Cj , ch(a Xj ) = q , 2 <j < N + 1 
c.{A) 

ch( a ki) — — ~ — , 1 < i < k < A 7 , 


<^(%) = 


c i + 1 
r=l 

c i+1 


for j = A' = / -h 1 


for other values of / + 1 < j \k < N + 1 


where 


9 +) = c ;+1 = 99 (A) + 99 (A) 3 , l <j < N - l. 


Proof. See Appendix I. 


Once we have the original system reduced to an upper triangular form, the rest of the steps in 
Algorithms GE and GJm can he applied. We have the following theorems: 


Theorem 3.2 The c/z-vector of the computed solution x using Algorithm GE is given as 


ch( Xi ) = g +1 "7 ((A) + (A) 3 )' V -'(A), 1 <i<N. 
c i c i+\ ’ C N 


Proof. See Appendix II. 


Theorem 3.3 The c/z-matrix of the newly computed results, after the completion of the z-th 
iteration of Step 2 of Algorithm GJm, is given as: 



r*(f]f r )({A} + {A} 3 / "{A} 


<M.<* k j+ 1 ) = 


r~k 


1 < k < /, 


W+l 


/+! 


(f]r r )({A} + {A} 3 )'’ 


k + 1 




r=A: 


/+! 


, 1 < A' < i, / + 2 < y < Af + 1 . 


n * 

r= 1 t r^k 


The c/i-vcctor of the computed solution x is the same as that given in Theorem 3.2. 

Proof. See Appendix III. 

From Theorem 3.3 we conclude that Algorithm GH and Algorithm GJ are equivalent to each 
other in terms of our error complexity measures. 
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Appendix I. 


Proof of Theorem 3.1. We prove by induction on i. For i =■ 1 it is true by the results dem- 
onstrated for V = 3. Assume the truth for / - 1 . Now for i we have 


(new) 

a ki 




(old) a 
% A 
Jold) 


i< k< N, 


By assumption, 


<*.<£*> = 


FT 

r—\ 


rr 

r—\ 


hence 


c/.(4r w> ) = 


cm4 ,IJ) ) 




For i + 1 < k < A' and i + 1 < j < N + 1 we have 




x of 1 ' 1 ) . 4f d ’A - x af ,J >6. 2 . 


Hence 


cKa^) 



{A} + 


C/{A} 



r—\ 


+ qc-{A } 3 




if k - j - i + 1 , 


otherwise. 


This proves the theorem. Q.T.D. 
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Appendix II. 


Proof of Theorem 3.2. We prove by backward induction on /. For / = N, 

X V = fl( a N y N+\l a NN)- 


So that 


c ^( a /V,,V+lH^} 

ch( ^‘ cK-hJ 


By Theorem 3.1 we have 


c/t( a /v, v+ 1 ) 


s r — i 

lb 

r— 1 


, ch(a NN ) = 


l n 


N - 1 

EE 

r=l 


Hence 


c/lCXyy) = 


£jvO) 

c N 


which is true. Now assuming the theorem is true for x iV , x N .. } , ... , jr i(1 , To obtain x, we first form 


Jo “ a i,N fl 

for j — N downto / + 1 do 

JjV y + 1 —flL V n~j ~~ a iJ X X j) 

and then compute jc, = f!(y N Ja tl )- By (2.3) we have 

y,w - J+ 1 = y,w-j A - a ij x A 2 - 


By using kmma 2.2 we obtain 
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ch(y s _ j+] ) = ch(y N _j){&} + 


c i 



*y*y'+ 1 ■*^/\ T 
c/y+i-^v 


({A} + {A}-y '{A} 3 . 


It is easily shown that the solution to the above equation is 


c% lV _;) : 


i - 1 

fl^ 

r=l 


^ +l "'p -({A} + {A}y 
9 + 1 - r -v 


Therefore 


e/iOv-() 


c c 



C /+ 1 e ;+ 2 ...C^ + 

c i+\C i+ 2- c N 


Now 




y.v-fi 


hence 


C/ 1 (X;) = 


f/ ■»("«) 


Since 



r=l 


/I 

the common factor n^r can 

r -1 


be cancelled and we obtain finally 


ch(Xi) = 


c i c i+l- c X 

C'C. i+ \...C N 


({A} + 
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This proves the theorem. Q.K.D. 



Appendix III. 


Proof of Theorem 3.3. we prove by induction on i I or / 1 we have 


^=/*>!2“W = 


(, 12 L 

a 22 


hence 


cKaft*>) = 


A ) 

ch(a 22 ) 


C|Ci( A } 

c 2 


Also 


(new) 


= /?($"> - air 7 x ay), 3 <j < N + 1. 




lienee 


ch(a\j ew ^) = cA(ag^){A} + c/Ka^V^^KA} 2 
= c,{A}+^-{A}-^{A} 2 

c? Cl 


c,c 2 {A} + ^{A}" 


^({A} + {A} 3 ) 


and the theorem is true. Assume now the theorem is true for all i up to / — 1. For i we have 


(new) fi(Jold) , \ = — 

a kJ + 1 a k,i+ \l a i +\ ,/+ 1 ' a. 


Jold) a 

a k,t+\ a 


'(+ 1 , 1+1 


1 < k < /'. 


1 lence 


ch (4j+l) = 


c/l (affi){A} 

ch ( a i+\,i+\) 


By the induction assumption 
18 



([]c r )({A} + {A} 3 )' - 


(old) v _ r=k 




■/+ i 


fi *> 

/•=!/*& 


, ch(a t+] 


r= 1 


Hence 


i 

ch(4 n ™l) = _ r - — - ({A} + {A} 3 )'- fc {A}, 1 < * < i. 

’ 1 


For 




a £,f+l 


X (I 


i+ 1 / 


:), 1 < k< i, 


i+2<j <N + l , 


then 


ch(a% ew) ) = cA(ag^){A} + ch(a^J^)ch(a i+ j ^) { A } 2 
i l 

(]1 C '>« A > + {A} 3 )*"*{A} ^ri c r) 


r=* 


fi ^ 

r=l,r#<: 


• + ; = * -- ((A) + {A} 3 y-*{A} {A} 2 

n* 


/+ 1 


mi*™ + + ( a)3) ' * 


r=k 


r—k 


i+ 1 


n ^ 

r=l ,r^A: 


f+1 


(f]c r )({A} + {A} 3 ) 


i-*+l 


r=A 


i+1 

ru 


since C: , i = c 


7+1 ~ W+i* 


r= 1 


This proves the first part of the theorem. For the computed solution x, we have x ( =Jl(a tNhl la lt ) y 
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hence 


r/i(.Y.-) =- V 

ch(a u ) 


Since 


N 


([~[c r )({A} + {A} 3 )" 1 


<Moi,n+i) = ' 


,V 


ch(a u ) 


n 

r— 1 jrt* 


therefore 


N 



ch( Xi ) = -Z=f— ({AH- {AlYlA}. 



r—i 


This proves the theorem. Q.E.D. 
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